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In longitudinal and spatial studies, observations often demon- 
strate strong correlations that are stationary in time or distance lags, 
and the times or locations of these data being sampled may not be ho- 
mogeneous. We propose a nonparametric estimator of the correlation 
function in such data, using kernel methods. We develop a pointwise 
asymptotic normal distribution for the proposed estimator, when the 
number of subjects is fixed and the number of vectors or functions 
within each subject goes to infinity. Based on the asymptotic theory, 
we propose a weighted block bootstrapping method for making infer- 
ences about the correlation function, where the weights account for 
the inhomogeneity of the distribution of the times or locations. The 
method is applied to a data set from a colon carcinogenesis study, 
in which colonic crypts were sampled from a piece of colon segment 
from each of the 12 rats in the experiment and the expression level 
of p27, an important cell cycle protein, was then measured for each 
cell within the sampled crypts. A simulation study is also provided 
to illustrate the numerical performance of the proposed method. 

1. Introduction. This paper concerns kernel-based nonparametric esti- 
mation of covariance and correlation functions. Our methods and theory are 
applicable to longitudinal and spatial data as well as time series data, where 
observations within the same subject at different time points or locations 
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have strong correlations, which are stationary in time or distance lags. The 
structure for the observation at a particular time or location within one 
subject can be very general, for example, a vector or even a function. 

Our study arises from a colon carcinogenesis experiment. The biomarker 
that we are interested in is p27, which is a life cycle protein that affects cell 
apoptosis, proliferation and differentiation. An important goal of the study 
is to understand the function of p27 in the early stage of the cancer develop- 
ment process. In the experiment, 12 rats were administered azoxymethane 
(AOM), which is a colon specific carcinogen. After 24 hours, the rats were 
terminated and a segment of colon tissue was excised from each rat. About 
20 colonic crypts were randomly picked along a linear slice on the colon 
segment. The physical distances between the crypts were measured. Then, 
within each crypt, we measured cells at different depths within the crypts, 
and then the expression level of p27 was measured for each cell within the 
chosen crypts. In this data set, crypts are naturally functional data (Ramsay 
and Silverman [13]), in that the responses within a crypt are coordinated 
by cell depths. There is a literature about similar data, for example, Morris 
et al. [11]. 

However, in this paper we will be focused on a very different perspective. 
In this application the spatial correlation between crypts is of biological 
interest, because it helps answer the question: if we observe a crypt with 
high p27 expression, how likely are the neighboring crypts to have high p27 
expression? We will phrase much of our discussion in terms of this example, 
but as seen in later sections, we have a quite general structure that includes 
time series as a special case. In that context, the asymptotic theory is as the 
number of "time series locations," that is, crypts, increases to infinity. 

Although motivated by a very specific problem, nonparametric covari- 
ance/correlation estimators are worth being investigated in their own right. 
They can be used in a statistical analysis as: (a) an exploratory device to 
help formulate a parametric model, (b) an intermediate tool to do spatial 
prediction (kriging), (c) a diagnostic for parametric models and (d) a robust 
tool to test correlation. Understanding the theoretical properties of the non- 
parametric estimator is important under any of these situations. A limiting 
distribution theory would be especially valuable for purpose (d). 

There is previous work on the subject of nonparametric covariance esti- 
mation. Hall, Fisher and Hoffmann [7] developed an asymptotic convergence 
rate of a kernel covariance estimator in a time series setting. They required 
not only an increasing time domain, but increasingly denser observations. 
Diggle and Verbyla [5] suggested a kernel-weighted local linear regression 
estimator for estimating the nonstationary variogram in longitudinal data, 
without developing asymptotic theory. Guan, Sherman and Calvin [6] used 
a kernel variogram estimator when assessing isotropy in geostatistics data. 
They proved asymptotic normality for their kernel variogram estimator in a 
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geostatistics setting, where they required the spatial locations to be sampled 
from the field according to a two-dimensional homogeneous Poisson process. 

As we will show below and as implied by the result from Guan, Sherman 
and Calvin [6], if the observations locations (or times) in the design are 
random, Hall's assumption, namely that the number of observations on a 
unit domain goes to infinity, is too restrictive and not necessary. However, in 
the setting of Guan, Sherman and Calvin [6], given the sample size, spatial 
locations are uniformly distributed within the field, which does not fit our 
problem, where crypt locations within a rat are, in fact, not even close to 
uniformly distributed. 

Our paper differs from the previous work on the kernel covariance estima- 
tors in the following ways. First, our approach accommodates more complex 
data structure at each location or time. Second, we allow the spatial loca- 
tions to be sampled in an inhomogeneous way, and as we will show below 
this inhomogeneity will affect the asymptotic results and inference proce- 
dures. In doing so, we generalize the setting of Guan, Sherman and Calvin 
[6], and link it to the setting of Hall, Fisher and Hoffmann [7]. Also, Guan, 
Sherman and Calvin [6] is mainly concerned with comparing variograms on 
a few preselected distance lags; we, on the other hand, are more interested 
in the correlation as a function. Third, we propose an inference procedure 
based upon our theory, thus filling a gap in the previous literature. 

The paper is organized as follows. Section 2 introduces our model as- 
sumptions and estimators, while asymptotic results are given in Section 3. 
An analysis of the motivating data is given in Section 4, where we also dis- 
cuss bandwidth selection and standard error estimation. Section 5 describes 
simulation studies, and final comments are given in Section 6. All proofs are 
given in the Appendix. 

2. Model assumptions and estimators. The data considered here have 
the following structure. 

• There are r = 1,...,R independent subjects, which in our example are 
rats. 

• The data for each subject have two levels. The first level has an increasing 
domain, as in time series or spatial statistics, and are the crypts in our 
example. We label this first level as a "unit," and it is these units that have 
time series or spatial structure in their locations. Within each subject, 
there are i = 1, . . . , A',, such units. 

• The second level of the data consists of observations within each of the 
primary units. In our case, these are the cells within the primary units, 
the colonic crypts. We will label this secondary level as the "subunits," 
which are labeled with locations. The locations with the subunits are 
on the interval [0,1]. For simplicity, we will assume there are exactly m 
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subunits (cells) within each unit (crypt), with the jth subunit having 
location (relative cell depth) x = (j — l)/(m — 1). However, all theories 
and methods in our paper will go through if the subunits take the form 
of an arbitrary finite set. 
• For 711 = 1, define x to be fixed at 0. It is analogous to the time series 
setting of Hall, Fisher and Hoffman [7] or the spatial setting of Guan, 
Sherman and Calvin [6]. 

Let Q{s,x) be a random field on T x X, where s is the unit (crypt) 
location and x is the subunit (cell) location, so that T= [0, oo), X = {{j — 
l)/(m — 1), j = 1, . . . , m}. Assume that 0r(-, •), r = I, . . . ,R, are independent 
realizations of ©(•,•). We use the short-hand notation Qri{x) = @r{Sri,x), 
where Sri is the location of the ith unit (crypt) within the rth subject (rat). 
Our model for the observed data is that 



(1) Yrij — &ri{xj^ -\- £ri 



where Y is the response (logarithm of p27 level), £rij are zero- mean uncor- 
related measurement errors with variance o"^, r = 1, . . . , R, i = 1, . . . , Nr and 
j = 1, . . . ,m are the indices for subjects (rats), units (crypts) and subunits 
(cells). Define ^'r(') = -Er{©rj(')} to ^^e the subject-level mean, and the no- 
tation refers to expectation conditional on the subject. Another way to 
understand ^'r(-) is to decompose the random field Qri', •) into the random 
effect model @ri{x) = "ifrix) + Ari{x), where ^'r(') is the fixed subject effect 
and Aj-i is the zero-mean, spatially correlated unit effect. 

Within each subject, we assume that the correlation of the mean unit 
(crypt) level functions is stationary over the distances between the units. 
In addition, the covariance between unit locations (si,S2) at subunit (cell) 
locations {xi,X2) is assumed to have the form 

(2) V{X1,X2,A} = E[{@r{si,Xi) -^r{xi)}{er{s2,X2) -^r{x2)}], 

where A = si — S2. While we develop general results for model (2), in many 
cases it is reasonable to assume that the covariance function is separable, 
that is, 

(3) V(xi,X2,A) = G(xi,X2)p(A). 

When the covariance function is separable, the correlation function at the 
unit-level, p(-) , is of interest in itself. In our application, p(-) is the correlation 
between crypts. We provide an estimator of p{-) as well as an asymptotic 
theory for that estimator. 

A first estimator for the covariance function has the form 



V{xj,xi,A) 



E E E Kh{Ar{i, k) - A}(y„,- - Yr.jWrkl " Yr-l) 
. r i kj^i 
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(4) 



EEE^MA.(.,A:)-A} 



where F,.,- = iY^' E^I^i ^^ij, A,(i,fc) = 5^ - Srk and Kh{-) = h-^K{-/h) 
with K being a kernel function satisfying the conditions in Section 3. 

It is usually reasonable to assume that V(a;i,X2,A) has some symmetry 
property, that it is an even function in A and V(xi,a;2,A) = V(x2,xi,A). 
However, the estimator defined in (4) does not enjoy this property. To see 
this, we observe that, for Xj ^ xi, although {Yrij — Yr.j){Yrki — Yr-i) and 
O^rii — Yr-i)iYrkj — ^r-j) estimate the same thing, they only contribute to 
V{xj,xi,A) and V{xj,xi, — A), respectively. We also observe that V(a;i, X2, A) 
V{x2,xi, -A). 

To correct the asymmetry of the covariance estimator, for A > 0, define 



V{xj,xuA) 



(5) 



EEE^'^il^-(^'^)l - - Yr.jWrkl - Yr.l) 



Y.Y.Y.I^h{\Ar{iM-^} 



and let V{xj,xi,A) = V{xj,xi,—A) for A < 0. As shown in the proof of 
Theorem 2, for a fixed A 7^ 0, V(a;i,X2,A) is asymptotically equivalent to 
{V(xi,X2,A) + V(xi,X2,-A)}/2. 

In addition, when the separable structure (3) is assumed, define the esti- 
mator for the within-unit covariance as 

(6) G(xi,X2) = V(xi,X2,0), 

and the estimator for the correlation function as 



(7) p{A) = { Y: E V(xi,X2,A) / E E G(xi,X2) 



<xi£X X2<xi 



^xiGX X2<xi 



3. Asymptotic results. The following are our model assumptions. Each 
subject (rat) is of length L, where in our example L is the length of the seg- 
ment of tissue from each rat. The units (crypts) are located on the interval 
[0,L], and in our asymptotics we let L — > 00, so that we have an increas- 
ing domain. Suppose that the positions of the units (crypts) within the rth 
subject (rat) are 5^1, . . . ,SrNr^ where the S^s are points from an inhomo- 
geneous Poisson process on [0,L]. Then Aj- j^ = Sri — Srk- The definition of 
an inhomogeneous Poisson process is adopted from Cressie [3]. We assume 
the inhomogeneous Poisson process has a local intensity ug*{s)^ where 1/ is a 
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positive constant and g*{s) = g{s/L) for a continuous density function g{-) 



A special case of our setting is that g{-) is a uniform density function 
and the units (crypts) are sampled according to a homogeneous Poisson 
process. This is the setting investigated in Guan, Sherman and Calvin [6]. 
Our setting resembles that of Hall, Fisher and Hoffmann [7] in the sense 
that we also model the unit locations as random variables with the same 
distribution: in our setting, the number of units within a subject (rat) is 
A^'r ~ Poisson (i/L), and given Nr, Sri/L,. . . ,Sr,Nr/L are independent and 
identically distributed with density g{-). By properties of Poisson processes, 
Nr/L = 0{i') almost surely, as L ^ oo, that is, the number of units (crypts) 
on a unit length tends to a constant. It is worth noting that Hall, Fisher and 
Hoffmann [7] required this ratio to go to infinity. We require less samples on 
the domain than do Hall, Fisher and Hoffmann [7]. 

In what follows, we provide a list of definitions and conditions. 

1. We assume that g{-) is continuous and ci > (7(t) > C2 > for all t S [0, !]■ 
Suppose ti, i = 1,2,3,4, are independent random variables with density 
5(-), define /i, /2 and /s to be the densities for ti — t2, (ti — ^25*3 — ^2) and 
(ti — 12, ts — t4, t2 — ^4), respectively. Since g{-) is bounded, one can easily 
derive that /i(0), /2(0,0) and /3(0,0,0) are positive. We also assume 
that /i and /2 are Lipschitz continuous in the neighborhood of 0, that is, 
|/i(u) — /i(0)| < Aj||u||, for Vu and some fixed constants Aj > 0, i = 1,2. 

2. Assume V(xi,X2,A) has two bounded continuous partial derivatives in 
A, and that sup^^ .^.^ / |V(xi, X2, A)| dA < 00. 



M{xi,X2,X3,X4,U,V,w) 

= Er[{eriAxi) - ^'r(xi)}{eri2(x2) - (2^2) }{0ri3 (^^s) " *r(x3)} 
X {Qrui^i) - 'i>rix4)}\Ar{il,i2) = U, 



- V{XI,X2,U)V{X3,X4,V). 

We assume M has bounded partial derivatives in u, v and w, and 



4. Denote br{xi,X2,A) = L~^ErEk^iKh{^ - Ar{i,k)}{Yr{Sri,xi) - 
^r{xi)}{Yr{Srk,X2) — ^r(a^2)}- We assume that, for any fixed A, for some 
ry>0, 

sup ^(1 var-i/2{6,(xi, X2, A)}[5,(xi, X2, A) - £;{6,(xi, X2, A)}]|2+^) 



on [0,1]. 



3. Let 



Arils, h) = V,Ar{i2,U) = w] 



(8) 




(9) 



< Cr, < 00. 
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5. Let J^{T) be the cr-algebra generated by {0(s,x),s S T, x S X}, for any 
Borel set T cT. Assume that the random field satisfies the mixing con- 
dition 

a(r) = sup[|P(^i n A2) - P{Ai)P{A2)\ : Ai G ^{[0, t]}, 
t 

(10) A2eJ^{[t + T,oo)}] 

= 0(r~^) for some 6>0. 

6. The kernel function is a symmetric, continuous probability density 
function supported on [—1,1]. Define aj^ = Ju^K{u)du and Rk = 
jK\v)dv. 

7. Assume that m and R are fixed numbers, L 00, /i — > 0, Lh — 00 and 
Lh^ = 0{l). 

In assumption 1, we are imposing some regularity conditions on g and fi. 
In fact, when g is differentiable fi are piecewise differentiable, but usually 
not differentiable at 0. However, the Lipschitz conditions on fi and /2 are 
easily satisfied when, for example, g is Lipschitz continuous on [0, 1]. 

Since we are estimating the covariance function, which is the second mo- 
ment function, we need a regularity condition on the fourth moment func- 
tion as in (8). Condition (9) may seem strong at the first sight, but it is 
simply a condition that bounds the tail probability of our statistics. For 
example, if we have an assumption analogous to (8) for the eighth mo- 
ment of Qr{s,x), we can use arguments as in Lemma A. 3 to show that 
E{[br{xi,X2, ^) — E{br{xi,X2, ^)}]'^) = 0{L~^h~^), and therefore condition 
(9) is satisfied for 77 = 2. In general, when the distribution of O is neither too 
skewed nor has a much heavier tail than that of the Gaussian, equation (9) 
will be satisfied. Assumption 6 and 7 are standard in the literature of kernel 
estimators. 

Denote V(°'°'2)(xi,X2, A) = 52V(a;i, 2:2, A)/5A2. Let V(A)^V(A) andV(A) 
denote the vectors collecting V{xi,X2,A.), V(xi,a;2, A) and V(3;i,X2,A), re- 
spectively, for all distinct pairs of {xi,X2)- The following are our main theo- 
retical results. All proofs are provided in the Appendix. Note that Theorem 
1 refers to V(-) in (4), while Theorem 2 refers to V(-) in (5). 



Theorem 1. Under assumptions 1-7, for A 7^ A', we have 
{RLhf/^ 

L l^V^ 



V(A)-V(A)-bias{V(A)} 
V(A')-V(A')-bias{V(A')} 

S(A) 



Normal 



C^(A,A') 



C(A,A') 
S(A') 
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where the asymptotic bias bias{V(A)} is a vector having entries bias{V(xi, rE2, 
^)} = o"f^V^*'''^'^-*(xi,a;2, A)/i^/2, S(A) is the covariance matrix with the en- 
try corresponding to cov{V(xi, X2, A), V(a;3, X4, A)} equal to Rk{M{xi,X2, 

2;3,X4, A, A,0) + /(X2 = X4)CJ^V(X1,X3, 0)+/(3;i = X3)(J^V(X2,2;4,0)+I(X1 = 

X3,X2 = Xi)a^} + I(A = 0)i2_ft'{7W(xi,X2, 2:3, X4, 0,0,0) + I{xi = Xi)alV{x2, 
x^,Qi) + I{x2 = 2;3)(T^V(xi, 0:4, 0) + = X4, X2 = x^)a'^} and C(A, A') is the 
matrix with the entry corresponding to cov{V(xi, X2, A), V(x3, X4, A')} equal 
to I{A' = -A){M{xi,X2,X3,X4,A,-A,-A) + I{x2 = X3)a'^V{xi,X4,0) + 

I{xi = X4) X a'^V{x2,X3,0) +I{xi = X4,X2 = X3)a^}. 



Theorem 2. Under assumptions 1-7, for A 7^ ±A', we have 



(RLh) 



1/2 



V(A) 
V(AO - 



-V(A) 
V(A') 



Normal 



0,{/>Vi(0)} 



bias{V(A)} 
-bias{V(A')} 

\ 





O(A') 



w/iere bias{V(A)} is a vector with entries bias{V(2;i, X2, A)} = a'j^V^^'^''^\xi, 
X2,A)/i^/2, r2(A) is the covariance matrix with the entry corresponding to 
cov{V(3;i,X2, A), V(x3,a;4, A)} equal to (l/2)i?i^{A^(2;i,3;2,a;3,X4, A, A,0) + 

M{xi,X2,X3, X4, A, - A, -A) +/(X2 = X4)a'^V{xi,X3,0) + /(xi = X3)cT^V(x2, 
X4,0) +/(xi = X3,X2 = X4)(Tg +/(X2 = X3)<T^ V(xi , X4, 0) + /(xi = X4)c7gV(x2, 
X3, 0) + /(X1 = X4, X2 = X3)(J^} + /(A = 0){1/2)Rk{2M{xi , X2, X3, X4, 0, 0, 0) + 
/(X2 = X4)(T^V(xi,X3,0) + /(xi = X3)fT^V(x2,X4,0) +/(xi =X3,X2 =X4)(T^ + 
/(X2 = X3)fj2V(xi,X4,0) +/(xi =X4)(t2V(x2,X3,0) + /(xi =X4,X2 = X3)af}. 



Corollary 1. Suppose the covariance function has the separable struc- 
ture in (3) with J2xi 'l2x2<xi G{xi,X2) / and p{A) defined in (7). Then for 
A 7^ O, we have 

{RLhf/^[p{A) - p{A) - bias{p(A)}] ^ Normal[0, {v'' fM]-^ a%A)\, 

where bias{p(A)} = {^(^^(A) - p{A)p'^'^\Q)}a\h^ /2 is the asymptotic bias of 
p{A) and al{A) = {E., E.,<x, G(xi, X2)}-2{l^f)(A)l + p2(A)l^f](0)l}. 

Remark. The measurement errors in (1) affect the covariance estima- 
tor mainly though the nugget effect [3]. In our covariance estimators (4) 
and (5), we eliminate the nugget effect by excluding the k = i terms in the 
summation. As a result, the measurement errors do not introduce bias to 
our covariance estimators. However, they do affect the variation of the co- 
variance estimators and hence the correlation estimator, as seen by the fact 
that cjg is in the variance expressions for all our estimators. 
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4. Data analysis. In this section we apply our methods to study the 
between-crypt dependence in the carcinogenesis experiment. Recall that the 
main subjects are rats, the units of interest are colonic crypts and the sub- 
units within a unit are cells at which we observe the logarithms of p27 
in a cell. The subunit locations that we work with in this illustration are 
at x = 0, 0.1, 0.2, . . . , 1.0. We discuss three key issues in our analysis, namely 
bandwidth selection, standard error estimation and positive semidefinite ad- 
justment, in the following three subsections. 

4.1. Bandwidth selection. 

4.1.1. Global bandwidth. Diggle and Verbyla [5] suggested a cross-validation 
procedure to choose the bandwidth for a kernel variogram estimator. We 
modify their procedure into the following two types of "leave-one-subject- 
out" cross-validation criteria. The first is based on prediction error without 
assuming any specific covariance structure and is given as 

m m 

(11) CVi{h)=Y^ 'Y'Y[Vr,ik{Xj^^l)-^{-r){xj,Xl,Ar{i,k)}f, 
r |Ar(i,fc)|<A(, j=l /=1 

where Vr,ik{xj,xi) = {Yrij - Yr.j)iYrki - Yr.i), V(_r)(xi,X2, A) is the kernel 
covariance estimator using bandwidth h, as defined in (5), with all informa- 
tion on the rth subject (rat) left out. Here we focus on the range |Ar(i, k) \ < 
Aq, where Aq is a prechosen cut-off point. The criterion CVi{h) thus evalu- 
ates the prediction error for different h within the range of |Ar(z, k) \ < Aq. 

The cross-validation criterion (11) assumes no specific covariance struc- 
ture, while our second cross-validation criterion takes into account the sep- 
arable structure in (3) and is given as 

m m 

r |A,(i,fc)|<Aoj=U=l 

(12) 

- G(_r){Xj,Xi)p(^_^){Ar{l,k)]] , 

where G(^_y.-){xi,X2) and p(„r)(A) are the estimators of G and p defined in 
(6) and (7), with the rth subject (rat) left out. 

We evaluated both criteria to estimate the bandwidth h. We chose Aq = 
500 microns. The first two columns of Table 1 give the minimum points and 
minimum values of the two cross-validation criteria. 

By observing Table 1, we find the two criteria gave almost identical min- 
imum values. Since the cross-validation scores are estimates of the predic- 
tion errors, the two cross-validation criteria represent prediction errors with 
and without the separable structure (3). The phenomenon that CVi(-) and 
CV2{-) have almost the same minimum values suggests that the separability 
assumption (3) fits the data well. 
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Table 1 




Outcomes 


of two cross- 


validation procedures on 


the carcinogenesis p27 






data 






Optimal h 


Min CV score 


Min score, 2 par 


CVi 


124.2334 


6.5073 


6.4867 


CV2 


122.7202 


6.4955 


6.4788 



The data used in the vahdation are those with A values less than Ao = 
500 microns. The first column gives the optimal global bandwidth, the 
second column gives the value of the cross-validation function at the 
optimal global bandwidth and the third column gives the minimum 
value of the cross-validation functions using two different smoothing 
parameters. 




Ll_ 

o _ 



s - 

O J I 1 1 1 1 1 1 1 1 1 

200 400 600 80O ioOO 

Fig. 1. Histogram of \Ar{i,k)\ m the carcinogenesis p27 data. |A| less than 1000 microns 
are considered. 

4.1.2. Two bandwidths. The independent variables in the kernel estima- 
tor are |Ar(i,A;)| for all pairs of crypts within one subject. As shown in 
Figure 1, the distribution of |Aj.(i, k)\ that are less than 1000 microns, even 
more than the target range of interest, is locally somewhat akin to a uniform 
distribution. 

As a robustness check on the global bandwidth, we repeated our analysis, 
except we used one bandwidth for |A| < 200 microns, and we used a second 
bandwidth for |A| > 200, and then repeated the cross-validation calculations 
in (11) and (12). The minimum values of the two cross-validation criteria 
are reported in the third column of Table 1. 
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Fig. 2. Estimated correlation function for the carcinogenesis p27 data. The solid curve 
is p(A), the dotted curves are p(A) ± 2SD{p{A)} , and the dashed curve is the positive 
semidefinite adjusted estimate, p(A). 



Comparing the results in columns 2 and 3 in Table 1 , we find the minimum 
values of the cross-validation functions did not change much, that is, an 
extra smoothing parameter did not substantially reduce the prediction error 
for the domain |A| < 500 microns. In other words, it appears sufficient to 
use a global bandwidth to estimate /5(A) for |A| < 500. For the following 
analysis, we use the bandwidth h = 122 microns, as suggested by CV2, and 
the resulting estimate p is shown as the solid curve in Figure 2. 

4.2. Standard error estimation. Our primary goal in this section is to 
construct a standard error estimate for /o(A). 

The asymptotic variance of p(A) has a very complicated form, which in- 
volves the fourth moment function of the random field, 7V4(3;i, X2, a^s, X4, n, v, w). 
With so many estimates of higher-order moments involved, a plug-in method, 
while feasible, is not desirable. We instead use a bootstrap method to esti- 
mate the variance directly. 

In our model assumptions, the number of subjects (rats) R is fixed, which 
means that bootstrapping solely on the subject level will not give a consistent 
estimator of the variance. Consequently, we decided to subsample within 
each subject. When the data are dependent, block bootstrap methods have 
been investigated and used, see Shao and Tu [15]). Politis and Sherman [12] 
also justified using a block sub-sampling method to estimate the variance 
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of a statistic when the data are from a marked point process. Our data can 
be viewed as a marked inhomogeneous Poisson process. However, because of 
the inhomogeneity, we need to modify their procedure: when we subsample 
a block from each subject and compute the statistic yo(A) by combining 
these blocks, the variance of the statistic depends on the corresponding local 
intensity at the location where each block is sampled. 

By letting i? = 1 in Corollary 1, our theory implies that if the number 
of units goes to infinity, each subject will provide a consistent estimator of 
p(A). Now, suppose the Poisson process for each subject has a different local 
intensity, Vrdris), r = 1, . . . ,R. With a slight modification of our theoretical 
derivations, one can show that 

r R . 1/2 

|E ^'/r,i(0)L/i| [p(A) - p(A) - bias{p(A)}] ^ Normal{0, a^(A)}, 

where fr,i(t) = J gr{t + u)gr{u) du, r = 1, . . . , R, are the counterparts of fi{t) 
used in Theorem 1, 2 and Corollary 1. 

Define A{A) = J2r Ei Efc^i ^h{A^(z, k) - A}. Then by Lemma A.2, 



in L^. 



Note that A{A) here is defined slightly different from a(A) in Lemma A.2. 

This equation suggests a natural way to construct correction weights in a 
weighted block bootstrap procedure that accommodates the inhomogeneous 
local intensity. We now propose our weighted bootstrap procedure: 

1. Resample R subjects (rats) with replacement. 

2. Within each resampled subject, randomly subsample a block with length 
L*. 

3. Combine the R blocks as our resampled data, compute p(A) and ^(A) 
using the resampled data, with the same bandwidth h as for the kernel 
estimator (7). 

4. Repeat steps 1-3 B times, denoting the results from the bth iteration as 
p;(A)and AHA). 

5. Obtain the estimator of the standard deviation as 

_ r B ]i/2 

sd{p(A)}= ^-i(A)i?-i^A^(A){p|(A)-p!(A)}2 , 

6=1 

where p!(A) = i3-iEf=iP^(A). 

The block length L* should increase slowly with L. Politis and Sherman [12] 
proposed a block size selection procedure for dependent data on irregularly- 
spaced observation points which are from a homogeneous point process. 
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Their procedure is built on asymptotic theory and needs a good pilot block 
size. The good performance of the procedure often requires a fairly large 
sample size. The implementation could be computationally intense. 

One operational idea for a moderate sample size in our context is to choose 
L* such that the correlation dies out outside the block but there are still 
a relatively large number of blocks. For this data set, we adopted a block 
size such that there are at least a couple of nonoverlapping blocks within 
each subject, and there are totally 24 nonoverlapping blocks by pooling all 
subjects together. In our analysis we took L* = 1 cm (=10,000 microns). 
We also tried L* = 8 mm and L* = 1.1 cm; the results are very similar. We 
investigate the numerical performance of this simple procedure in both of 
the two simulation studies in Section 5, and we find it works pretty well. 

The two dotted curves in Figure 2 show p±2 standard deviation. The 
plot implies that the correlation is practically zero when the crypt distance 
exceeds 500 microns. 

4.3. Positive semidefinite adjustment. By definition, p{A) is a stationary 
correlation function and therefore is positive semidefinite, that is, / / />( Ai — 
A2)a;(Ai) x u;(A2) dAi dA2 > for all integrable functions a;(-). By Bochner's 
theorem, the positive semidefiniteness is equivalent to nonnegativity of the 
Fourier transformation of p, that is, p~^{9) > for all 6, where p~^{9) = 
p{A) expiiOA) dA = 2 p{A) cos(0A) dA. 

To make p a valid correlation function, we apply an adjustment procedure 
suggested by Hall and Patil [8] . First we compute the Fourier transformation 
of?(-), 

P^{e) = 2 / p{A)cos{eA)dA. 
Jo 

In practice, we cannot accurately estimate p(A) for a large A because of 
data constraints. So, what we should do is multiply p by a weight function 
u;(A) < 1 and let 

/•CO 

p+{e) = 2 p{A)w{A)cos{eA)dA. 
Jo 

Possible choices of w{-) suggested by Hall, Fisher and Hoffmann [7] are 
wi{A) = I{\A\ < D) for some threshold value -D > 0; and W2(A) = 1 if |A| < 
Di, {D2 - \A\)/{D2 - Di) if Di < \A\ < D2 and if |A| > D2. 

The next step is to make p^ nonnegative and then take an inverse Fourier 
transformation. So, the adjusted estimator is defined as 

p(A) = (27r)"^y" max{p+(^),0}cos(M)d^. 

The adjusted estimate of the correlation function for the colon carcinogenesis 
p27 data is given as the dashed curve in Figure 2. 
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5. Simulation studies. We present three simulation studies to illustrate 
the numerical performance of the kernel correlation estimator under different 
settings. 

5.1. Simulation 1. Our first simulation study is to mimic the colon car- 
cinogenesis data, so that the result can be inferred to evaluate the perfor- 
mance of our estimators in the data analysis and to justify our choice of 
tuning parameters. 

The simulated data arise from the model 

Yj, (Sr-j 5 ) — ©f. (Sri ^ ■^j) ~l~ ^rij ; 

where Q*{s,x) is the rth replicate of a zero-mean Gaussian random field 
Q*{s,x), r = 1, . . . , 12. As in our data analysis, x takes values in {0.0,0.1, . . . , 
0.9, 1.0}. We used the actual unit (crypt) locations from the data as the sam- 
ple locations Sri in the simulated data. In addition, @*(s,x) has covariance 
structure (2) and (3), with 

/ 12 \ -1 12 Nr 

G*ixi,X2)=iY.Nr] Y.J2{Y"ixi)-Yr.ix^)}{Yri{x2)-Yr.ix2)}, 
\r=l / r=l i=l 

(13) 

which is computed from the data, and /0*(A) chosen from the Matern correla- 
tion family p*{A;cj),K) = {2''-^r{K)}-^{A/(j))''K^{A/(j)), where K^{-) is the 
modified Bessel function; see Stein [16]. In our simulation we chose k = 1.5 
and (j) = 120 microns. In addition, the e*jj are independent identically dis- 
tributed as Normal(0, cr^. ). For cr^. we use an estimate of cr^ from the data, 
^e* = TrS]ii{G'*(xj,Xj) - G{xj,Xj)}, where Xj = (j - 1)/10, j = 1, . . . , 11, 
and G* and G are defined in (13) and (6), respectively. 

For each simulated data set, we computed p{A) and the standard devia- 
tion estimator SD{p(A)} that we proposed in Section 4.2, for bandwidths 
h = 120 and 200 microns. When doing the bootstrap, we used block size 
L* = 1 cm, as we did in the p27 data analysis. We repeated the simulation 
200 times. 

Figure 3 shows the means and 5% and 95% pointwise percentiles of p 
for the two bandwidths, and compares them to the truth p* . Obviously, as 
expected from the theory, the larger bandwidth incurs the bigger bias. By the 
plots, it seems that when h = 120 the kernel estimator p behaves quite well. 
We compare the true bias from the simulation study to the asymptotic bias 
computed with the true correlation function p* , under bandwidth h = 120. 
We find the differences between the two are less than 0.04. This means the 
bias shown in Figure 3 is explainable by our asymptotic theory. 

In Figure 4, we show the pointwise standard deviation of p from the 
simulation and the mean of the bootstrap standard deviation estimates. 
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Fig. 3. Plots of the correlation estimators in Simulation 1. Upper panel: h= 120; lower 
panel: h = 200. In each plot, the solid curve is the true correlation function p(-), the dashed 
curve is the mean of^{-), the dotted curve is the mean of p{-), and the dot-dash curves are 
the 5% and 95% pointwise percentiles of 'p. h = 200 oversmooths the curve, hence incurs 
larger bias. 
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Fig. 4. Standard deviation ofp in Simulation 1. The solid curve is the pointwise standard 
deviation ofp from the simulation in Section 5.1, and the dashed curve is the mean of the 
200 bootstrap standard deviation estimates. The bandwidth h — 120 was used. 

The closeness of the two curves impHes that our bootstrap procedure in 
Section 4.2 gives an approximately unbiased estimator of the true standard 
deviation, which also implies that our choice of block length, L* = 1 cm, is 
reasonable. In our simulation, we also tried L* = 8 mm and L* = 1.1 cm; the 
results are very similar. 

We applied the positive semidefinite adjustment procedure in Section 4.3 
to the simulation, and the pointwise mean of p{-) is also shown in Figure 
3. We computed the integrated mean squared errors (IMSE) of p and p up 
to A = 500, and found that IMSE(p) = 12.56 whereas IMSE(p) = 8.50. This 
result agrees with the theory in Hall and Patil [8] that positive semidefi- 
nite adjustment can actually improve the integrated mean squared error of 
the raw kernel estimator. We found that most of the improvements come 
from the regions where p{-) is close to or 1, the areas where the proce- 
dure corrects the shape of p the most due to the enforcement of positive 
semidefiniteness . 

5.2. Simulation 2. As suggested by the referees, we provide a second 
simulation study to evaluate the finite sample numerical performance of our 
correlation estimator when the locations or times are from an inhomoge- 
neous Poisson process as assumed in Section 3. Also, we choose a correlation 
function which is similar in shape to that obtained from the p27 data ex- 
ample, but is even less monotone; this clearly illustrates a situation that an 
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"off-the-shelf" parametric model fails to fit the data. The true correlation 
function is given by the solid curve in the middle panel in Figure 5, while 
the corresponding spectral density is given in the upper panel of the same 
figure. 

We kept the same simulation setup as those in Simulation 1, except that 
the spatial correlation p{A) was set to be the one given in Figure 5, the 
locations were sampled from an inhomogeneous Poisson process as given 
in Section 3 with g a truncated normal density function on [0, 1] , and we 
simulated only one subject on a prolonged domain [0,L], with L = 50,000 
and the expected number of units equal to 500. We let the bandwidth h = 35 
and block size L* = 6000 for the weighted block bootstrap procedure. 

We repeated the process described above 200 times, and computed the 
proposed correlation estimator for each simulated data set. In the middle 
panel of Figure 5, the mean of our kernel correlation estimator is given 
by the dashed curve, while the dotted curve is the best approximation to 
the true correlation function from the Matern family. As one can see, our 
nonparametric method can consistently estimate a nonmonotone correlation 
function. 

In Figure 5 we also compare the mean of our bootstrap standard deviation 
estimator with the true pointwise standard deviation curve. We found that 
the proposed standard deviation estimator also works quite well given the 
finite sample size. 

5.3. Simulation 3. Our third simulation study has the same setting as 
Simulation 2, except that the correlation function is replaced by 



as suggested by one referee. This correlation is given by the solid curve 
in Figure 6. We use this simulation to illustrate the performance of the 
kernel correlation estimator and its adjusted version in the case that the 
true correlation function is not smooth at 0. 

Because this function decays to with a slower rate, we present the es- 
timate up to A = 1000. The dashed curve in Figure 6 gives the mean of p 
over 200 simulations, the two dotted curves give the pointwise 5% and 95% 
percentiles of p and the dot-dash curve gives the mean of p. One can see that 
p still behaves well even though the true function is not differentiable at 0. 
The adjustment procedure introduced some bias, but reduced the variation. 
We compared the IMSE of the two estimators over [0,50] and [0,500]: the 
IMSE values for p over the two ranges are 0.40 and 6.59, while the corre- 
sponding IMSE values for p are 0.19 and 4.53. The adjustment procedure 
improved the IMSE on both ranges, even in an area close to the origin. 

This simulation study shows that our estimators work even when the 
differentiability assumption in Section 3 is mildly violated. 
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Fig. 5. Simulation 2. Upper panel: the spectral density of the correlation used in the 
simulation; middle panel: the solid curve is the true correlation function, the dashed curve 
is the mean of the kernel correlation estimator and the dotted curve is the best Matern 
approximation to the true correlation; lower panel: the solid curve is the true pointwise 
standard deviation for the kernel correlation estimator, the dashed curve is the mean for 
the bootstrap standard deviation estimator. 



SPATIAL CORRELATION FUNCTIONS 



19 




6. Discussion. We have proposed an estimator of stationary correlation 
functions for longitudinal or spatial data where within-subject observations 
have a complex data structure. The application we presented has a functional 
data flavor, in that each unit (crypt) in a "time series" has sub units (cells), 
the values from which can be viewed as a function. However, in this paper, 
we have focused on estimating the spatial correlation between the units. 

We established an asymptotic normal limit distribution for the proposed 
estimator. The techniques used in our theoretical derivation were signifi- 
cantly different from those of the standard kernel regression literature. In 
our theoretical framework, as long as we have an increasing number of ob- 
servations within a subject, each subject yields a consistent estimate of the 
correlation function. Our method and theory are especially useful to the 
cases where the number of subjects is limited but we have a relatively large 
number of repeated measurements within each subject. Since having more 
subjects will just further reduce the variation of the estimator, our main 
theorems hold when R goes to infinity as well. In that case, we need to 
replace the condition that Lh^ = 0(1) in assumption 7 in Section 3 with 
RLh^ = 0(1). In fact, when the number of subjects R — > oo, we can con- 
sistently estimate the within-subject covariance without a large number of 
units within each subject. For example, Yao, Miiller and Wang [18] proposed 
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using smoothing methods to estimate within-subject covariance for sparse 
longitudinal data. 

In spatial statistics, many authors have considered the setup under the 
intrinsic stationary assumption (Besag, York and Mollie [2], Besag and Hig- 
don [1]). This is weaker than our second-order stationary assumption. In our 
case, each unit within a subject has further structure, so that we can define a 
cross- variogram (Cressie [3]) instead of the covariance function V(xi,X2, A), 
and similar limiting distribution theorems can be proved as in Theorem 
1 and 2. However, when it comes to spatio-temporal modeling, many au- 
thors (Cressie and Huang [4], Stein [17]) would still focus their attention 
on covariance estimation because it is a more natural way to introduce the 
separable structure (3). In our data analysis, we provided some practical 
ideas to justify the separable structure in our data, where we compare the 
cross-validation scores with and without the separable assumption. 

We proposed a weighted bootstrap method to estimate the standard de- 
viation of the correlation estimator where the weights were constructed 
based on the outcome from Lemma A. 2 in the proofs. Our simulation studies 
show that the proposed correlation estimator and the weighted bootstrap 
standard deviation estimator work well numerically for finite sample sizes. 

APPENDIX: PROOFS 

The proofs are organized in the following way: in Section A.l, we provide 
lemmas regarding asymptotic properties of the covariance estimators when 
there is only one subject; in Section A. 2, we provide lemmas on the estima- 
tors with multiple subjects, and the proofs of Theorems 1, 2 and Corollary 
1 are given at the end. 

A.l. Estimation within one subject. We first discuss the case where 
there is only one subject and the number of units goes to infinity. Let iV(-) be 
the inhomogeneous Poisson process on [0,-L] with local intensity ug*{s). As 
in Karr [10], denote N2{dsi,ds2) = N{dsi)N{ds2)I{si /S2). Let 0(s,-) de- 
note the unit-level mean at unit location s, and ^'(•) denote the subject-level 
mean. Define 

= L-i Kh{A-{si-S2)}N2{dsi,dS2y, 

Jo Jo 

6(xi,rc2, A) = L-i^^ i^^(A - \k){Y{Si,xi) - ^{xi)}{Y{Sk,X2) - ^{x2)} 

= L-i r ["^ Kh{A-isi-S2)}{Y{suXi)-^{xi)} 
Jo Jo 
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X {Y{S2,X2) - '^{x2)}N2{dSi,dS2). 

Lemma A.l. Let Xi and X2 be real-valued random variables measurable 
with respect to J-'{[0,t]} and J^{[t + t, 00)} respectively, such that \Xi\ < Ci, 
i = 1,2. Then \ cov{Xi,X2)\ < 4CiC2a(r). IfXi and X2 are complex random 
variables, this inequality holds with the constant 4 replaced by 16. 

Proof. The proof is analogous to that of Theorem 17.2.1 in Ibragimov 
and Linnik [9]. Denote Ti = [0,t], T2 = + r, 00). Then we have 

\E{XiX2) - E{Xi)E{X2)\ 

= \E[E{XiX2\r{Ti)}] - E{Xi)E{X2)\ 

= \E{Xi[E{X2\J^{Ti)} - EiX2)])\ 

<CiE\E{X2\nTi)}-E{X2)\ 

= CiE{ui[E{X2\T{n)}-EiX2)]), 

where ui = sign[E{X2\J^{Ti)} — £^(^2)]. It is easy to see that ui is mea- 
surable with respect to J'iTi), and therefore \E{XiX2) - E{Xi)E{X2)\ < 
Ci\E{uiX2) — E{ui)E{X2)\. By the same argument, we have \E{uiX2) — 
E{ui)E{X2) \ <C2\E{uiU2) - Eiui)E{u2)\, where = sign[^{'Ui [.^(Ta)} - 
E(ni)].Nowwehave \E{XiX2) - EiXi)E{X2)\ < CiC2\E{uiU2) - E{ui)E{u2)\. 
Define the events Ai = {ui = 1} G J'iTi), Ai = {ui = -1} G :r(ri), A2 = 
{U2 = 1} G T{T2) and A2 = {u2 = -1} G ^(Ts). Then 

\E{uiU2) - E{ui)E{u2)\ 

= \P{A,A2) - P{AiA2) - P(AiA2) + P(A{A2) 

- P{Ai)P{A2) + P{Ai)P(A2) + P(Ai)P{A2) - P(Ai)P(A2)\ 
< \P{AiA2) - P{Ai)P{A2)\ + \P{AiA2) - PiAi)P(A2)\ 

+ \P(A,A2) - P(Ai)P{A2)\ + \P(A{A2) - P(Ai)P(A2)\ 
<4a(r). 

Thus, the proof is completed for the real random variable case. If Xi and X2 
are complex, we can apply the same arguments to the real and imaginary 
parts separately. □ 

Lemma A. 2. With the assumptions stated in Section 3, for any fixed A, 
we have a(A) — > i/^/i(0) in the L? sense, as L ^ 00. 

Proof. Recall that by definition of /i(-), if Xi and X2 are independent 
and identically distributed with density g{-), then fi{u) = J g{t + u)g{t)dt 



22 Y. LI ET AL. 

is the density of Xi — X2. Thus, for fixed A, 

E{a{/\)} = u'^L-^ [ [ Kh{A - (si - S2)}g{si/ L)g{s2/ L) dsi ds2 



= u'^l[ I Kh{A-L{h-t2)}g{ti)g{t2)dtidt2 
Jo Jo 

= i/2Ly" j Kh{A- Lu)g{t2 + u)g{t2)dudt2 

= v'^L j Kh{A- Lu)f\{u) du = u^ J K{v)fi{{A-hv)/L}dv 

= u^ j K{v){h{Q) + O(L-i)} dv = i/Vi(0) + O(L-i). 

Next, 

E{a\A)} = L-^ r Kh{A-[si-S2)]Kh{A-[s-i-Si)} 
Jo Jo Jo Jo 

X E{N2{dsi,ds2)N2{ds^,dsi)}. 
Calculations as in Guan, Sherman and Calvin [6] show that 
E{N2{dsi,ds2)N2{ds:i,dsi)} 

= v'^g*{si)g*{s2)g*{s:i)g*{s4) dsi ds2 dss ds^ 

S3 



+ g* {si)g* {s2)g* {sA)£sMs-i) dsi ds2 ds4 
+ i^^g*{si)g*{s2)g*is3)esj (ds^) dsi ds2 dss 



+ i^^g*{si)g*is2)g*is4,)es2ids-i) dsi ds2 ds4, 
+ i^^g*{si)g*{s2)g*is3)es2{ds4,) dsi ds2 dss 
+ u'^g* {si)g* {s2)£si {ds3)es2 (ds^) dsi ds2 
+ i''^g*{si)g*{s2)es^{dsi)es2{ds3) dsi ds2, 

where Sx{-) is a point measure defined in Karr [10], such that ex{dy) = 1 if 
X G dy, otherwise. Here dy is defined to be a small disc centered at y. There 
are seven terms in the expression above, so the expression for £'{a^(A)} can 
be decomposed into seven integrals; denote them as An- An. Similar to the 
calculations of -E{a(A)}, we have 

Aii = v^L-^ j j [[ Kh{A-{si-S2)}Kh{A-{s3-Si)} 

J J Siy^S2 J J S37^S4 

X g{si/L)g{s2/L)g{s3/L)g{s4/L) dsi ds2 dss ds4 

= u^fUO) + o{l), 
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Ai2 = ly^L-'^ I Kh{A - {si - S2)}Kh{^ - {si - sa)} 

X g{si/ L)g{s2/ L)g{si/ L) dsi ds2 ds^ 
= u^L J J Kh{^- Lui)Kh{A- L{ui-U2)}f2iui,U2)duidu2 

(by definition of /2) 
= i/^L-i J J K{vi)K{v2)f2{{^ - hvi)/L, {V2 - vi)h/L} dvi dv2 
= i/3l-1/2(0,0) + O(L-2). 
Similarly, ^13-^15 are of order 0(L~^). Next, 

Aie = i^^L"^ I Kl{A - (si - S2)}gisi/L)g{s2/L) dsi ds2 
= v'^ j Kl{A- Lu)fi{u) du 
= u^L~^h~^ J K^{v)fi{{A-hv)/L}dv 

Similarly, we can show that An is of the same order as Aiq. This means 
that ^11 is the leading term in E{a^{A)}. Hence, E{a{A) - f'^fiiO)}'^ 0, 
completing the proof. □ 

Lemma A. 3. For any fixed A, define X2, A) = b{xi,X2,A) — a{A) x 
V{xi,X2,A). Then 

i?{/3(rEi , X2 , A) } = (0) { V(°'°'2) (xi , X2 , A /i 2/2 + o(/i' ) } , 

COv{P{xi,X2,A),f3{x3,X4, A')} 

= u^L~^h~^RKfm 

X [/(A = A'){A4(xi,X2,a:3,X4,A,A,0) 

+ /(x2 = X4)a1V{xi,X3,Q) +I{xi = X3)a1V{x2.,X4^,{)) 

+ I{xi =X3,X2 = X4)cr^} 

+ /(A = -A'){M{xi,X2, X3, X4, A, - A, -A) 
+ I{x2 = X3)a'^V{xi,Xi,0) 
+ I{xi = XA)a'^V{x2,X3,0) 

+ I{xi = Xi,X2= X3)a^}] 

+ o(L-i/i-i), 
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where V(°'°'2)(xi, X2, A) = S^VCxi, xa, A)/aA2. 

Proof. Rewrite 

/?(xi,a;2, A) 

= L-^| j Kh{^-{si-S2)} 

X [{Y{si,xi)-^{xi)} 

X {Y{S2,X2) - ^{X2)} - V{xi,X2,^)]N2{dsi,ds2). 

It follows that 

E{/?(X1,X2,A)} 

= i.2L-i / / Kh{^-{si-S2)} 

J J S\^S2 

X {V{xi,X2,Si - S2) - V(xi,X2, A)} 
X g{si/L)g{s2/L) dsi ds2 

= v^L J Kh{A - Lu){V{xi,X2,Lu) - V{xi,X2, A)}fi{u) du 

= u^J K{v){-V^^'^'^\xi,X2,A)hv 

+ V(°'°'2)(xi, X2, A)/i2^;2/2 + o(/i2)}{/i(0) + O(L-i)} dv 
= ^'{/i(0)V(°'°'2)(xi,X2,A)ai/iV2 + o(/i2)}. 
In addition, 
cov{/9(xi, X2, A), /3(x3, 3:4, A')} 

= ^-2! 1 1 lKh{A-isi-S2)}Kh{A'-is3-s^)} 

X [V(xi,X2, A)V(x3,a;4, A') 

- V(xi, X2, Si - S2)V(X3, 3:4, A') 

- V(xi, X2, A) V(X3, X4, S3 - S4) 

+ 7\/[{xi,X2,a;3,a;4, (si - S2), (S3 - S4), (s2 - S4)} 

+ V(xi, X2, Si - S2)V(X3, X4, S3 - S4) 
+ /(Si = S3)/(S2 / S4)/(xi = 2:3) 

X a'^V{x2,X4, (S2 - S4)} 
+ /(Si = S4)/(S2 / S3)/(xi = X4) 
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X CTg V{X2,X3, (S2 - S3)} 
+ I{S2 = S3)I{si / S4)/(X2 = X3) 

X Cr^V{xi,X4,(si -S4)} 
+ 1(^2 = S4)Iisi / 53)-^ (2:^2 = 2:4) 

X a^V{xi,X3,{si - S3)} 

+ I{si = S3,S2 = S4) 

X {/(x2 = a:;4)o-^V(xi, 0:3,0) 

+ = X3)CJ^V(X2, X4, 0) 

+ =X3,X2 = X4)cr£} 

+ I{si = S4,S2 = S3) 

X {I{X2 = X3)a'^V{xi,X4,0) 

+ = X4)a'^V{x2,X3, 0) 

+ =X4,X2 = X3)cr^}] 

X E{N2{dsi,ds2)N2{ds3,dsA)} 

-^'L-^J J J jKh{^-{si-S2)}KH{A'-{s3-s,)} 

X {V{xi,X2,si - S2) - V(2;i,X2, A)} 

X {V(x3, X4, S3 - S4) - V(a:3, X4, A')} 

X g{si/ L)g{s2/ L)g{s3/ L)g{si/ L) dsids2 ds3dsi. 

As in Lemma A. 2, according to the expression for E{N2{dsi,ds2)N2{ds3,ds4)}, 
we can summarize this covariance expression as the sum of seven terms, de- 
noted as ^21^^27- We have 

A21 = v^L~^ r Kh{^ - (si - S2)}A-aA' - (S3 - S4)} 

JO JO JO JO 

X M{xi,X2,Xi,X2, (Si - S2), (S3 - S4), (S2 - S4)} 

X g{si/ L)g{s2/ L)g{s3/ L)g{si/ L) dsi ds2 ds3 ds^ 
= iy^L^ r [\h{^- L{ti-t2)]Kh{^' - L{t3-U)} 

JO JO JO JO 

X M{xi,X2,Xi,X2,L{ti -t2),L{t3 -ti),L{t2 -ti)} 

X g{ti)g{t2)g{t3)g{t4) dti dt2 dt3 dt^ 
= u^L^ [ [ [ Kh{A-Lui)Kh{A' -Lu2) 
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X M{xi,X2,Xi,X2,Lui,Lu2,Lu3) 
X f3{ui,U2,U3)duidU2dU3 

= u^L~^ J J J K{vi)K{v2)M{xi,X2,xi,X2,A-hvi,A'-hv2,V3) 
X /3{(A - hvi)/L, (A' - hv2)/L,V3/L}dvidv2dv3 
<u^L~^C J M{xi,X2,xi,X2,A,A',v)dv + o{L~'^), 

where C is the upper bound for the density function f3{u,v,w) on [—1,1]^. 
By assumption 1 in Section 3 that g{-) is bounded, one can easily derive 
that C is a finite constant. The second term is 

A22 = v^L-^ J J J Kh{A - (si - S2)}Kh{A' - {si - s^)} 

X {[V{xi,X2,A) -V{xi,X2,isi -S2)}] 
X [V{X3, Xi, A') - V{x3,Xi, {si - S4)}] 
+ M{xi,X2,X3,X4,{si -S2),(S1 -S4),(S2 - Sa)} 
+ I{xi = X3)a'^V{x2,X4, {S2 - S4)}) 

X g{si/L)g{s2/L)g{s4/L) dsi ds2 ds^ 
= u^L J J J Kh{A - L{ti - t2)}Kh{A' - L{h - U)} 
X ([V(xi,X2,A) -V{xi,X2,L(ti -ta)}] 

X [V(X3, X4, A') - V{X3,X4,, L{ti - U)}] 
+ M{xi,X2,X3,Xi,L{ti -t2),L{ti -ti),L{t2 -ti)} 
+ I{xi = X3)a'^V{x2,Xi,L{t2 -ti)}) 

X g{ti)g{t2)g{t4) dti dt2 dt^ 
= ^^lJ J Kh{A + Lui)Kh{A' + Lu2) 

X [{V(xi,a;2,A) - V(a;i,a::2,--Lni)} 
X {V(x3, a;4, A') - V(a;3, X4, -LU2)} 
+ A^{xi,X2,X3,X4, -Lui, -Lu2,L{ui - U2)} 
+ I{xi = X3)a'^V{x2,X4,L{ui - U2)}] 

X f2{ui,U2) dui dU2 

= u''L~^ [ f K{vi)K{v2)[I{xi=X3)alV{x2,X4,{vi-V2)h + A'-A} 
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+ {V(xi, X2, A) - V(xi, xa, A - hvi)} 

+ {V{X3,X4, A) - V{X3,X4, A' - hv2)} 

+ M{xi,X2,X3,X4,A - hvi,A' - hv2, 



{vi-V2)h + A' -A}] 
X /2{(-A + hvi)/L, (-A + vih)/L} dvi dv2 
= u^L~^f2{0, 0){M{xi,X2,X3,X4, A, A', A' - A) 



+ I{xi = X3)a'^V{x2, X4, A' - A)} + o(L"^). 



It is easy to see that ^23^^25 have the same order as A22. Further, we have 



X {M{xi,X2,X3,Xi, (si - S2), {Sl - S2),0} 

+ [V(xi,a;2, A) - V{xi,X2, (si - S2)}] 

X [V(X3, 3:4, A') - V{X3, 3:4, (si - S2)}] 
+ {I{x2 = X4)a'^V{xi,xs,0) 

+ I{xi = X3)a^V(x2, X4, 0) + I{xi = X3, X2 = X4)a^}) 
X 9{si/L)g{s2/L) dsids2 



X {M{xi,X2,X3,Xi,L{ti - t2),L{ti - t2),0} 
+ [V(xi,X2, A) - V{xi,X2,L(ti -t2)}] 

X [V(X3, X4, A) - V{X3, X4, L{ti - t2)}] 
+ {/(X2 =X4)cr^V(xi,X3,0) 

+ I(xi = X3)CJ^V(X2, X4, 0) 



X 9{h)g{t2)dtidt2 

= I{A = A'y j Kl{A-Lu) 

X [A^(xi, X2, X3, X4, Lii, Lii, 0) 
+ {V(xi,X2, A) - V{xi,X2,Lu)} 
X {V(x3, X4, A) - V(x3, X4, Lu)} 

+ {/(X2 = X4)crfV(xi,X3,0) 





+ I{xi = X3, X2 = X4)(J£ }) 



28 Y. LI ET AL. 

+ I{XI = X-i)(jlV{x2,Xi, 0) 

+ I{xi = x3,a;2 = X4)cj^}] 

X /i(u) du 

= I{A = A')i^'^L-^h-^ J K^{v)[M{xi , X2,X3, X4, A-hv,A- hv, 0) 

+ {V{xi,X2,A) - V{xi ,X2,A- hv)} 

X {V(X3, X4, A) - V{X3,X4, A - hv)} 

+ {I{x2 = Xi)a'^V{xi,X3,0) 
+ I{xi = X3)a'^V{x2,X4, 0) 

+ I{xi = X3, X2 = XA)a^}] 

X fi{{A-hv)/L}dv 
= /(A = A')u^ L-^h-^RKf liO) 
X [M{xi,X2,X3,X4,A,A,0) 

+ {/(x2 =X4)CJ^V(X1,X3,0) + I{xi = X3)a'^V{x2,X4,0) 

+ I{xi =X3,X2 =X4)a^} + o{l)]. 

Similarly, 

A27 = u^L~^ J J Kh{A - (si - S2)}Kh{A' - {S2 - si)} 

X {M{xi,X2,X3,Xi, (Si - S2), (S2 - Si), {S2 - Si)} 
+ [V{xi,X2,A) - V{xi,X2, (Si - S2)}] 

X [V{X3,X4, A') - V{x3,Xi, (S2 - Si)}] 
+ {I{X2 =X3)CJ^V(X1,X4,0) +I(xi = Xi)a'^V{x2,X3,0) 

+ I{xi = X4, X2 = X3)a^}) 

X g{si/L)g{s2/L) dsids2 
= I{A = -A')u^J J Kl{A-L{h-t2)} 

X (A^{xi,a;2,X3, j;4,L(ti -t2),L{t2 -ti),L{t2 - h)} 

+ [V(X1,X2, A) - V{xi,X2,L{ti-t2)}] 

X [V(x3, 2:4, -A) - V{x3, X4,L{t2 -h)}] 
+ {I{x2 = X3)a'^V{xi,Xi,0) 

+ I{xi = X4)cr^V(x2, 2:3, 0) 
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+ I{xi = X4, X2 = XsjcTg }) 



X g{ti)g{t2)dtidt2 
= I{A = -A')!^^ J kI{A - Lu)[M{xi,X2,X3,X4^,Lu,-Lu,-Lu) 

+ {V{xi,X2,A) - V{xi,X2, Lu)} 

X {V{X3,X4, A) - V{X3,XA, Lu)} 
+ {/(x2 =X3)CJ^V(X1,X4,0) 

+ I{xi = Xi)a'^V{x2,X3, 0) 



= I{A = -A'yL-'h-'RKfi{0) 
X [M{xi,X2,xs,X4,,A,-A,-A) 
+ {I{x2 = a:3)cr^V(a:i,X4,0) 
+ I{xi = Xi)a'^V{x2,X3, 0) 
+ /(xi = Xi,X2= X3)a^} + o{l)]. 

Both ^426 and ai'e of order 0{{Lh)~^}, while the rest of the terms are 
of order 0{L~^). The proof is completed by summarizing the contribution 
of each term to cov{/3(xi,X2, A), /3(x3, X4, A')}. □ 

Lemma A. 4. With /3(xi,X2,A) defined as in Lemma A. 3, and with all 
the assumptions in Section 3, we have 



{Lh)'/^[(3{xuX2,A) - X2, A)}] ^ Normal{0, fi{0)a\xi,X2, A)}, 



+ I{xi = X4,X2 = X3)af}] 



X fi (u) du 




X [^A{xl,X2,X3, Xi, A — hv, —A + hv, —A + hv) 
+ {V{xi,X2,A) -V{xi,X2,A-hv)} 

X {V(X3, X4, A) - V(X3, X4, A-hv)} 
+ {I{x2 = X3)cr'^V{xi , X4, 0) 
+ = X4)cr^V(x2, X3, 0) 

+ /(xi =X4,X2 =X3)CJ^}] 

X fi{{A-hv)/L}dv 
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where a^{xi,X2, A) = Rk{M{xi,X2,xi,X2, A, A,0) + a'^V{xi,xi,0)+a'^V{x2, 
X2,0) + a^} + I{A = 0)Rk[{M{xi,X2,xi,X2,0,0,0) + I{xi = X2){2a^,V{xi, 
xuO)+at}]. 



Proof. The proof has similar structure to that of Theorem 2 in Guan, 
Sherman and Calvin [6]. Define ai = 0, bi = — L'', ai = aj_i + LP, bi = ai + 
U -L'i,i = 2,. . .,kL, for some 1/(1 + 6) <q<p<l [6 is defined in (10)]. We 
thus have divided the interval [0, L] into k^K, L/U' disjoint subintervals each 
having length — L'^ and at least L'^ apart. Define li = [ai, bi], I = Ufii h, 
I[=[a,/L,bi/L], I' = [fitil[ and 

A(xi,X2,A) = L-i / / Kh{A-{si-S2)} 

J JliXli 

X [{Y{si,Xi) - ^{XI)}{Y{S2,X2) - ^{X2)} 

- V(xi,x2,A)] 

X N2{dsi,ds2), 

kL 

f3{xi,X2,A) = ^Pi{xi,X2,A). 
1=1 

Define independent random variables "yi{xi,X2, A) on a different probability 
space such that they have the same distributions as Pi{xi,X2,A), and de- 
fine 'j{xi,X2, A) = J2i£,i 7i(xi, X2, A). Let and ip{£,) be the characteristic 
functions of {Lhy/^[P{xi,X2, A)-E{f3{xi,X2, A)}] and {Lhy/^[-f{xi,X2, A) - 
E{'y{xi,X2, A)}], respectively. 

We finish the proof in the following three steps: 

(i) mxi,X2,A) - E{P{xi,X2,A)}] - [P{xi,X2,A) - E0{xi,X2, 
A)}])^0; 

(ii) - m ^ 0; 

(iii) {Lh)y^[j{x,,X2,A) - E{j{xi,X2,A)}] Normal{0,z.2/i(0)cT'(xi, 

X2,A)}. 

To show (i), notice that, with — > oo, calculations as in Lemma A. 3 show 
that 

^ var{A(xi, X2, A)} = ^ i^^L~^h~^RKfi,iiO)WHxi,^2,A) + o(l)}, 
1=1 1=1 

(14) 

where fi,i{u) = J gi{u + t)gi{t) dt is the counterpart of fi{u), with gi{t) = 
g{t)I{t € I'i). Since g{-) is bounded away from both and oo, /ii(0) = 
j^,g\t) = 0{\r^) = 0{LP-^) and var{A(a;i,X2,A)} = 0(LP-2/i-i). 
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Observe that |/'| = Y!lti = x (LP - Li)/L L/W x {LP - Li)/L = 
l-L'i-P^l, and 



g{tydt = MO). 



(15) ^/,,i(0)=^ lg{t?dt= lg{tfdt^ [ 
i=i 1=1-' "'0 

Therefore, J2i=i var{/3j(2;i, X2, A)} = var{/3(3;i, 3:2, A)} + o{L^^h^^). Further 
but equivalent derivations show that J2i^j cov{Pi{xi,X2, A), (3j{xi,X2, A)} = 
0(L~^). The calculations here are similar to those in Lemma A. 3, except 
that the iy^j condition excluded terms like A22 through A27- Now we have 

kL 

var {/3 (xi , X2 , A) } = ^ var{/3j (xi , X2 , A) } + ^ cov{/3i (xi , rE2 , A) , /?j (xi , X2 , A) } 

1=1 i^j 

= var{/3(xi, X2, A)} + o(L"^/i~^). 

Similarly, one can show that 

cov{/?(xi,X2, A),/3(xi,X2, A)} = var{/3(xi,X2, A)} + o(L~^/i~^). 

Therefore, (L/i) var[{/3(xi, X2, A) — {/3(xi, X2, A)}] — > 0, and step (i) is estab- 
lished. To show (ii), we follow similar arguments that prove Theorem 2 (S2) 
in Guan, Sherman and Calvin [6]. Denote Ui = exp(/x(L/i)^/^[/3j(xi, X2, A) — 
E{f3i{xi,X2, A)}]), where / is the unit imaginary number. Then by defini- 

tion, m=E{uhlU^): H^)=uhiEm- 

Observing \E(Ui)\ < 1 for all Ui, we have 



(x) -^p{x)\ 

/kL 



< 



< 



/kr-l 



Eyli^^j -^yU u,jE{UkL) 

/kL \ /kL-1 \ 



+ 



/kr-l 



e[ n U,]E{UkL)-l[E(U,[ 



i=l 



i=l 



vt=l 



+ 



^kr-l 



1 = 1 
kL-1 



^ n - n E{Ui 



1=1 



< 



i=l 
/kL-1 



\E{Uk 



E\\{Ui]-E[ n uAE{UkL 



\i=l 

By induction, 



1=1 



k,-i 



+ 



/kr-l 



Ei n 



i=l 



kL-1 

n Em 

i=l 



v(x)i< J2 



E[l[uA-E[Y[uAEm^ 



yi=l 



3 

1 

i=l 
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/ i 



cov 



+1 



Observe that 0^=1 ^i+i -^([Oj ^j])" ^^i^ ^([aj+i, 6j+i])-measurable, 

respectively, with |ni=if^i| ^ 1 ™d < 1, and the index sets are at 

least L'' away. By Lemma A.l, 

i=i 

By om' choice of p and q, it is easy to check 1 — p — q6 < 0, and therefore 

\(j){x) - ^lJ{x)\ ^0. 

(iii) can be proved by applying Lyapounov's central limit theorem and by 
the fact that 

kL 

(Lh) var{7i {xi , X2, A) } ^ i/^/i (O)cr^ {xi , X2, A) , 

4 = 1 

which has been shown in (14) and (15). 

It remains to check Lyapounov's condition. By condition (9), 

^Ei}yi{xi,X2,A) - EH{xi,X2,A)}\^+^) 
1=1 



var 



{7(X1,X2,A)}](2+W2 



0{(L-l/l-l)(2+'7)/2} 



= 0(L-(i-P)'"'/2) 
The proof is thus complete. □ 



0. 



Lemma A. 5. Let /3(A) be the vector collecting all /3(xi,X2,A) for dis- 
tinct pairs of {xi,X2). Then, with all assumptions above, for A' ^ A, 



(Lh) 



1/2 



(3{A) - E{(3{A)} 
p{A') - E{P{A')} 



Normal jo, zv2/i(0)(^T^^)^, 



C(A,A') 
) 5](A0 



where 5](A) is the covariance matrix with the entry corresponding to cov{/3(xi, 
X2,A),P{x3, X4, A)} equal to Rk{M{xi,X2,X3,X4,A,A,0) + I{x2 = Xi)al x 

V(X1, 3:3,0) +I{XI = X'i)alV{x2,XA-,Q) + {Xl = X'i,X2 = X4)fT^} + /(A = 0) X 

-Ri^{A4(xi,X2,X3, 0:4, 0,0,0) + I{xi = Xi)alV{x2,X3, 0) + I{x2 = X3)alV{xi, 
a:;4,0)+/(xi =X4,X2 =a;3)o"^} andC{A,A') is the matrix with the entry cor- 
responding to cov{/3(rci, X2, A), /3(2;3, 0:4, A')} equal to I{A' = —A){M.{xi,X2., 
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X3,X4,A,-A,-A) + I{x2 = x^)alV{xi,Xi,Q) + I{xi = Xi)alV{x2,xz,Q) + 

I{xi =X4,a;2 = X3)<T^}. 

Proof. Using similar proofs as for Lemmas A. 3 and A. 4, we can show 
that any hnear combination Y^\=iCij3{xii,Xi2, Is) + X]i=i Ci/3(3;ji, Xi2, A') is 
asymptotically normal. By the Cramer-Wold device (Serfling [14]), the joint 
normality is established. □ 

Note. If A' = — A, the limiting distribution on the right-hand side 
is a degenerate multivariate normal distribution, because /3(xi,a;i,A) = 
A) for all xi. 

A. 2. Estimation with multiple subjects. Now suppose we have R sub- 
jects, and i? is a fixed number. Define 

Yr^ik{xj,Xl) = {Yrij - 'I'r{xj)}{Yrkl - "^rixi)}, 

a, (A) = L-^Y.T.Kh{^- ^r,^k), 
brixj,xi,A) = L'^^^Yr^ikixj,xi)Kh{A. - Ar{i,k)}, 

Pr{Xj,Xi,A) = br{Xj,Xi,A) - ( A) V(xj , X;, A) , 

Cr{Xj,A) = L'^J2Y.i^rij - '^riXj)}Kh{A - A^(i,/c)}. 

i k^i 

Further, define a(A) = <^r{^)-, b{xj,xi,A) = J2r brixj,xi, A), (3{xj,xi,A) = 
Y.^Prixj,xi,A) and Vo(xi, X2, A) = ^(xi, a;2, A)/a(A). Let Vo(A) and V(A) 
be the vectors collecting all Vo(xi,X2,A) and V(.xi,X2,A) for all distinct 
pairs of (xi,X2), respectively. 

Lemma A. 6. With the assumptions in Section 3, for A' ^ A, 



1/2/ Vo(A)-V(A)-aiV(2)(A)/iV2 
^ ^ I Vo(AO - V(A') - af^V(2)(A')/iV2 



Normal 



0,{i^Vi(0)} 



.1/ S(A) C(A,A') 
\C^{A,A') S(A') 



where V''^'*(A) is the vector collecting V^'^''''^^(xi,X2, A) for all distinct pairs 

of (X1,X2). 

Proof. Notice that 

Vo(xi,X2, A) - V(xi,X2, A) 
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R 



^{br{xi,X2,A) - ar{A)V{xi,X2,A)} 



Lr=l 



R 



= P{xi,X2,A)/a{A). 

Since subjects are independent, by Lemma A. 2, a(A) / {u^ Rfi{0)} 1. 
Also, by Lemma A.5, (i?"^L/i)^/2{/3(A)^, /3(A')^}'^ is asymptotically nor- 
mal with covariance matrix given in Lemma A.5. Thus, by Slutsky's theorem 
[14], 

1/2 r (3{A)/a{A)-E{fi{A)]/a{A) 
_f3{A')/a{A')-E{(3{A')}/a{A') 

S(A) C(A,A')' 



{RLh) 



Normal 



0,{i/2/i(0)}-i 



C^(A,A') S(A') 



Finally, by Lemma A.3, E{(3{xi,X2, A)} = Rv'^ fi{Q){V^^^^^'^^ (xi, X2, A)c7^/iV 
2 + o(/i2)},so that we have £;{^(xi,X2, A)}/a(A) =cj|-V(°'°'2)(xi,a;2,A)/iV2 + 
Op{h?). The Op{h?') term is eliminated by the assumption that Lh^ = 0(1). 
□ 

Lemma A. 7. With all the assumptions above, we have that 
V{xi,X2, A) = Vo(xi, X2, A) + Oj,(L-i/i-V2)_ 

Proof. Notice that 
V{xj,xuA) = Voixj.xuA) 



+ 



(16) 



Y,{Yr-j-^r{Xj)}Cr{xi,A) 

+ - ^r[xi))Cr{xj,A) 

r 

+ Y^i^r.j - ^riXj)}{Yr.l - ^r{xi)}ar{A) 



a-HA), 



Cr{xi,A) = L 



-1 



{Y{si,xi) - ^rixi)}Kh{A - (si - S2)}N2idsi,ds2). 



Using the expression above, it is easy to see that E{cr{xi, A)} = 0, and 
calculations as in Lemma A.3 show that 

var{crixi,A)} = L~^ I J J J Kh{A - {si - S2)}Kh{A - {s^ - s^)} 



X [V{xi,xi, (si - S3)} + I{si = S3)a^] 
X E{N2{dsi,ds2)N2{ds3,ds4)} 



0{v^L-^h-^' 
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On the other hand, Y^.j — ^r{xj) = J {yr{s,Xj) — "^r{s,Xj)}N{ds). It is 
easy to see that E{Yr.j — "^r{xj)} = 0, and that 



= v'^L? I V{xj,Xj, Lu)fi{u) du + i/L / {V{xj,Xj,0) + a'^}g{si) dsi 



By properties of Poisson processes, we have Nr/{vL) 1 a.s. Therefore, 
wehaveYr.j -"^rixj) =Op{L~^/'^), Cr.(xi,A) = Op{L~^/'^h~^/'^). By Lemma 
A.2, a^(A) = Op(l). Therefore, V(xi, xs. A) - Vo(xi, X2, A) = Op(L-i/i-i/2), 
completing the proof. □ 

Proof of Theorem 1. This is a direct result from Lemmas A. 6 and 
A.7. □ 

Proof of Theorem 2. For a fixed A / 0, when /i < |Aj, we have 
V(xi, rE2, A) = {V(xi, a;2. A) + V{xi^X2, — A)}/2. This equation is true auto- 
matically for A = 0. Therefore, the asymptotic distribution of V(A) is the 
same as that of {V(A) + V(-A)}/2, for any fixed A. 

For Ai ^ ±A2, by Theorem 1, {V(Ai), V(-Ai)}'^ and {V(A2), V(-A2)F 
are asymptotically independent, and the joint asymptotic normality of the 
four vectors can be established. Therefore V(Ai) and V(A2) are jointly 
asymptotic normal and asymptotically independent. It suffices to show that 
f](A) is the asymptotic covariance matrix of V(A). 

For A 7^ 0, apply the delta method to the joint asymptotic distribution 
of V(A) and V(— A); the following gives the asymptotic covariance between 
V(xi,X2,A) and V(x3,X4,A): 



(l/4)(i?L^)-i{^Vi(0)}-^i?x 

X {A4(a;i,a;2,a;3,X4, A, A,0) + A4(xi, X2, X3, X4, -A, -A, 0) 
+ 2Al(xi , X2, X3, X4, A, -A, -A) + 2/(x2 = X4)crf V(xi, X3, 0) 



Ww[Nr{Yr.j--^r{Xj)}] 




[V{Xj,Xj,(si -S2)} + /(S1 =S2)(tI] 

X {u'^g{si/L)g{s2/L)dsids2 + vg{si/ L)es^{ds2) dsi} 



Lf i{0) / V{xj,Xj,u)du + i'L{V{xj,Xj,0) + a'^} + o{L). 



+ 2I(xi = X3)cj£ V(X2, X4, 0) + 2/(xi = X3, X2 = X4)cr^ 
+ 2I(X2 = X3)cj£ V(xi, X4, 0) + 2/(xi = X4)(T£ V(X2, X3, 0) 

+ 2I(xi = X4, X2 = X3)a^}. 
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Note that M.{xi,X2, a^s, X4, —A, —A, 0) = ^A{xl,X2,x^s,X4, A, A, 0) by the sym- 
metry in the definition of J^{xi,X2,X3,X4,u,v,w). Next, for A = 0, we have 
V{xi,X2,0) = V(xi, X2, 0), and the asymptotic covariance between V(xi, X2, 0) 
and V(j;3,X4,0) is given in Theorem 1. The proof is completed. □ 

Proof of Corollary 1. The result follows from Theorem 2 and the 
delta method. To see this, note that, with the separable structure in (3), we 
hayeV{xi,X2,A) = G{xi,X2)p{A) eindV'^^'^'^\xi,X2,A) = G{xi,X2)p^^\A). 
By the delta method, the asymptotic mean of p{A) is 

E.,exJ:.,<.An^i^^2,A) + alvm^){xi,X2,A)hy2 + o,{h^^ 
E.,<x, {GixuX2) + alG{xi , xs)^^) (o)/,2/2 + o^(/,2)} 

= {p(A) + alp^'\A)h'/2 + o,ih')}/{l + aip(2)(o)/,2/2 + Op{h')} 
= {p{A) + alp^^\A)h'/2 + Op{h')} * {1 - (0)^2/2 + o^{h')} 

= piA) + {p(2)(A) - p(A)p(2)(o)}a|^/iV2 + o,{h^). 
The asymptotic variance of p{A) also follows from the delta method. □ 
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